DNA Research — Latest Matching Preprints

1

Chromosome-level genome assembly of Calotes wangi with dynamic colour variation

Qiu, X.; Wang, Y.; Wen, J.; Chen, Y.; Zhao, L.; Jian, J.; Yang, W.

2026-07-10 evolutionary biology 10.64898/2026.07.07.736949 medRxiv

Top 0.1%

4.2%

Wang's garden lizard, Calotes wangi, is a widely distributed agamid species in Southern China and Northern Vietnam and exhibits pronounced colour variation and rapid body colour change. Despite increasing interest in the genomic basis of colour variation, chromosome-level genomic resources remain limited in agamid lizards. Here, we generated a chromosome-level reference genome of C. wangi using PacBio HiFi sequencing and Hi-C scaffolding. The final genome assembly was approximately 1.66 Gb in size and comprised 6 macrochromosomes and 11 microchromosomes, with a contig N50 of 110.09 Mb and 98.9% complete BUSCO genes. A total of 20,442 protein-coding genes were annotated. Comparative genomic analyses identified 297 significantly expanded gene families, with enriched functions associated with steroid metabolism, chromatin regulation, and epigenetic processes. This high-quality genome assembly provides an important genomic resource for future studies of colour variation, phenotypic plasticity, and evolutionary diversification in agamid lizards.

2

A Highly Contiguous Reference Genome for Scalesia gordilloi (Asteraceae), a Critically Endangered Plant Endemic to the Galapagos Islands

Pozo, G.; Rivas-Torres, G.; Velez-Darquea, E.; Barragan-Orbe, D.; Torres, M. d. L.

2026-06-29 genomics 10.64898/2026.06.25.734018 medRxiv

Top 0.2%

1.9%

Scalesia gordilloi is a critically endangered species endemic to San Cristobal Island in the Galapagos archipelago and represents one of the most unique and vulnerable lineages within the adaptive radiation of the genus Scalesia. Despite its evolutionary distinctiveness and conservation importance, no genomic resources have been available for this species. Here, we present the first high-quality reference genome of S. gordilloi, generated using Oxford Nanopore long-read sequencing. Across three PromethION R10.4.1 flow cells, we obtained 80.5 Gb of long reads (~25X coverage), which enabled a highly contiguous 3.61 Gb assembly composed of only 549 contigs and an N50 of 106.6 Mb. BUSCO completeness reached 98.6%, with assembly metrics comparable to other high-quality Asteraceae genomes. Repeat annotation revealed that 76.2% of the genome is composed of interspersed elements, dominated by LTR retrotransposons. Structural annotation resulted in 47,913 high-confidence protein-coding genes, consistent with expectations for large, repetitive Asteraceae genomes. This genome provides a critical foundation for conservation genomics, enabling assessments of genetic diversity, inbreeding, and adaptive potential in the species. It further establishes a framework for comparative genomics across the Scalesia radiation and supports future efforts to protect and restore one of the most threatened plant lineages of the Galapagos Islands.
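
For readers less familiar with the contiguity metric quoted in this abstract, the following is a minimal sketch of how an N50 is conventionally computed from a list of contig lengths. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the preprint or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    The N50 is the length L of the contig at which, walking from the
    longest contig downwards, the running total first reaches at least
    half of the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (arbitrary units), not real data.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

A higher N50 for the same total size means that fewer, longer contigs account for half of the assembly, which is why it is commonly reported alongside contig count and BUSCO completeness.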

3

A Draft Male Genome Assembly of the Slipper Lobster (Thenus australiensis) Reveals an XY System and a Validated Diagnostic Marker for Monosex Aquaculture.

Tran Nguyen, A. H.; Ha, G.-H.; Tran, D.-P.; Le, N. T.; Glendining, S.; Fitzgibbon, Q.; Herzig, V.; Luu, P.-L.; Ventura, T.

2026-06-29 genomics 10.64898/2026.06.24.734161 medRxiv

Top 0.2%

1.9%

The slipper lobster (Thenus australiensis) is rapidly emerging as a high-potential species for commercial aquaculture. Because females exhibit superior growth characteristics due to less frequent moulting after sexual maturity, developing monosex breeding strategies is highly desirable for industry profitability. However, the lack of genomic resources and early sex-identification tools has hindered this development. Here, we report the first draft male genome assembly for T. australiensis, generated using a combination of whole-genome shotgun sequencing, DArT-seq, and multi-tissue transcriptomics. The curated assembly spans 0.913 Gbp with high functional completeness (93.0% BUSCO), providing a robust repertoire of 30,100 protein-coding genes. Through k-mer subtraction and population-level DArT-seq genotyping, we provide definitive evidence that T. australiensis utilizes an XX/XY sex-determination system. Crucially, by identifying male-specific structural variations within a neo-Y locus, we developed a diagnostic PCR assay targeting a male-exclusive sequence. This 171 bp marker achieved 100% accuracy in phenotypic sex identification across wild-caught populations. Ultimately, these foundational genomic resources, combined with a highly reliable molecular sexing tool, provide the critical framework necessary for early sex sorting, broodstock management, and the commercial advancement of monosex slipper lobster farming.
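
The abstract does not give the details of its k-mer subtraction step, so the following is only a simplified sketch of the general idea: collect k-mers seen in male sequences and discard any that also occur in female sequences. The function names, the tiny sequences and the choice of k are hypothetical, and real workflows typically also handle reverse complements and apply count thresholds, which are omitted here.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def male_specific_kmers(male_seqs, female_seqs, k):
    """K-mers present in at least one male sequence and in no female sequence."""
    male = set().union(*(kmers(s, k) for s in male_seqs))
    female = set().union(*(kmers(s, k) for s in female_seqs))
    return male - female


# Hypothetical toy reads; the male read differs from the female read at one base.
candidates = male_specific_kmers(["ACGTTGCA"], ["ACGTAGCA"], k=4)
print(sorted(candidates))
```

Candidate k-mers of this kind would then need to be assembled into loci and validated experimentally, as the abstract describes for its PCR marker.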

4

Evolutionary dynamics of Aegilops revealed through comparative genome assembly of all 25 species

Shazadee, H.; Edwards, T.; Levesque-Lemay, M.; Zheng, C.; Ens, J.; Pozniak, C. J.; You, F. M.; Cloutier, S.

2026-07-10 genomics 10.64898/2026.07.09.737531 medRxiv

Top 0.2%

1.5%

Aegilops species are the closest wild relatives of wheat and an important reservoir of genetic diversity for its improvement. Despite their potential, many Aegilops genomes remain poorly characterized. Here we present high-quality assemblies of 18 diploid, tetraploid, and hexaploid Aegilops genomes, which, along with the previously published genomes, complete the production of reference assemblies for all 25 genomes in this genus. Assembly sizes ranged from 5.24 Gb in diploids to 12.65 Gb in hexaploids, with scaffold N50 values up to 749.2 Mb. Gene annotation identified 53,035-156,779 protein-coding genes, of which 21,865-60,490 were classified as high-confidence. Orthogroup-based pangenome analysis across the 25 Aegilops genomes identified 80,521 orthogroups, including 15,809 core, 61,735 dispensable, and 2,977 species-specific orthogroups, highlighting substantial gene content variation among genomes. Phylogenetic analysis of 63 Triticum and Aegilops genomes/subgenomes based on near single-copy orthologs defines the phylogenetic relationships within the Triticum/Aegilops complex and confirms diploid progenitors of polyploid lineages. Ae. mutica (T) and Ae. speltoides (S) belong to the B lineage while the remaining Sitopsis grouped within the D lineage. Structural variation analyses using diploid progenitors as references revealed extensive large-scale rearrangements following polyploidization, emphasizing the dynamics of their evolution. Transposable element (TE) annotation further highlighted subgenome-specific TE expansions and contractions, providing insights into the mechanisms shaping genome structure after polyploidization. Collectively, these genomic resources provide a comprehensive framework for exploring Aegilops diversity, understanding polyploid evolution, and accelerating wheat improvement.
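
The core, dispensable and species-specific categories reported above can be illustrated with a short sketch. It assumes the common convention that a core orthogroup is present in every genome, a species-specific one in exactly one genome, and a dispensable one in any number in between; the abstract does not state its exact thresholds, and the function name and toy data are hypothetical.

```python
from collections import Counter


def classify_orthogroups(presence, n_genomes):
    """Count orthogroups per category.

    presence maps an orthogroup ID to the collection of genomes in which
    it has at least one member (assumed definitions, see lead-in).
    """
    counts = Counter()
    for genomes in presence.values():
        k = len(set(genomes))
        if k == n_genomes:
            counts["core"] += 1
        elif k == 1:
            counts["species-specific"] += 1
        elif k > 1:
            counts["dispensable"] += 1
    return dict(counts)


# Hypothetical toy example with three genomes A, B and C.
toy = {"OG1": {"A", "B", "C"}, "OG2": {"A", "B"}, "OG3": {"C"}}
print(classify_orthogroups(toy, n_genomes=3))

# The three categories reported in the abstract add up to its stated total.
print(15_809 + 61_735 + 2_977)
```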

5

Gene model for the ortholog of DENR in Drosophila eugracilis

Lawson, M. E.; Sanow, K. A.; Martinand, I.; Fratian, M.; Matura, M.; Rele, C. P.; Reed, L. K.; Thompson, J. S.; O'Rourke, K. S.

2026-06-26 genomics 10.64898/2026.06.23.734050 medRxiv

Top 0.2%

1.5%

Gene model for the ortholog of Density regulated protein (DENR) in the Apr. 2013 (BCM-HGSC/Deug_2.0) (DeugGB2) Genome Assembly (GenBank Accession: GCA_000236325.2) of D. eugracilis. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

6

A chromosome-level reference genome of the largest cervid species - the European moose (Alces alces; Linnaeus, 1758)

Torresen, O. K.; Mysterud, A.; Skage, M.; Danneels, B.; Strand, M. A.; Ferrari, G.; Tooming-Klunderud, A.; Jakobsen, K. S.

2026-07-08 genomics 10.64898/2026.07.03.736352 medRxiv

Top 0.2%

1.5%

We describe a chromosome-level, haplotype-resolved genome assembly from a male European moose (Alces alces alces). The assembly comprises two pseudo-haplotypes of 3,148 Mb and 3,112 Mb, with sex chromosomes in haplotype one, and 33 autosomes in each haplotype (68 in total). Assembly completeness is high (BUSCO 98.3% and 95.7%), with 21,496 and 20,498 annotated protein-coding genes for haplotypes one and two, respectively. This genome assembly is the most complete so far generated for European moose.

7

An integrative single-cell and spatial transcriptomics atlas highlights candidate regulatory factors in the development of gerbera capitulum

Gao, Y.; Li, F.; Jin, C.; de Ridder, D.; Immink, R.; Sun, Y.; Hu, P.; Cao, Y.; Shao, H.; van Dijk, A. D. J.; Wang, J.

2026-07-10 plant biology 10.64898/2026.07.05.736605 medRxiv

Top 0.2%

1.4%

In Asteraceae species, the capitulum is a compact inflorescence, featuring a characteristic reproductive structure. Despite the identification of a few key regulatory factors, the transcriptome-level information on the developing capitulum remains limited. Here, we applied single-cell and spatial transcriptome sequencing to investigate the developing Gerbera hybrida capitulum during floret differentiation. We obtained a transcriptomics atlas encompassing different stages of the Gerbera capitulum and analyzed the cellular and spatial dynamics of gene expression. Using marker gene expression and GO enrichment of cluster-specific DEGs, we annotated putative cell types and described changes in gene expression across sampled stages, potentially associated with ongoing developmental processes. We detected activity of previously undescribed MADS-box genes and defined their spatial expression patterns. Notably, the MADS-box gene GAGL12 was found to be enriched in the putative capitulum phloem cells. The GAGL12 protein was shown in yeast two-hybrid assays to interact with several other MADS-domain proteins with hypothesized functions in vasculature development, and further detailed in silico analyses supported a candidate role in the development of capitulum vasculature. Altogether, we provide integrative and dynamic transcriptomic insight into capitulum and floret development and lay a basis for future functional studies of the control and development of this intriguing reproductive structure.

8

Chromosome-level Genome Assembly of the Potato Leafhopper Empoasca fabae (Hemiptera: Cicadellidae)

Molligan, J.; Sylvestre, F.; Perez-Lopez, E.

2026-07-09 genomics 10.64898/2026.07.04.736200 medRxiv

Top 0.3%

1.0%

The potato leafhopper, Empoasca fabae (Harris, 1841), is a highly polyphagous, migratory insect pest of eastern North America that feeds on more than 200 herbaceous and woody plant species, causing substantial losses to forage and field crops. Despite its agricultural and ecological importance, no genome has been available for this species. Here, we present the first chromosome-level genome assembly of E. fabae, generated from Oxford Nanopore long reads, Illumina short reads, and Omni-C proximity-ligation data. The final assembly spans 908 Mb across 132 scaffolds, with 99.8% of the assembly captured in ten chromosome-length scaffolds (nine autosomes and an X chromosome) with a scaffold N50 of 96.2 Mb. The assembly is highly complete, recovering 92.4% of conserved hemipteran single-copy orthologs, and is composed of 47.6% repetitive sequence, dominated by long terminal repeat retrotransposons and unclassified elements. Read-depth comparison between male and female individuals supports assignment of a single sex-linked chromosome, consistent with an XO sex-determination system. BRAKER3 gene annotation predicted 31,406 protein-coding genes after retaining the longest isoform per locus. Comparative genome analysis against the two closest related Typhlocybinae species with genomes available, Matsumurasca onukii and Hebata decipiens, revealed extensive chromosome-scale collinearity, while defining a shared core gene repertoire. This reference genome provides a foundation for comparative and population genomic studies and for investigating genetic traits in this economically important crop pest species.

9

Comprehensive transcriptome data of melittin- and un-treated murine hepatoma Hepa 1-6 cells

Zhang, R.; Zhang, Y.; Zang, H.; Lou, J.; Li, Y.; Jiang, J.; Chen, D.; Yan, T.; Guo, R.

2026-06-30 cancer biology 10.64898/2026.06.25.734412 medRxiv

Top 0.3%

0.9%

Melittin, the principal bioactive peptide of bee venom, exerts potent antitumor activity against hepatocellular carcinoma (HCC). However, the comprehensive transcriptomic alterations it elicits in hepatoma cells remain poorly characterized. Here, we present an integrated transcriptome dataset from melittin- and un-treated murine Hepa 1-6 hepatoma cells, encompassing messenger RNA (mRNA) and microRNA (miRNA) expression profiles. Cells were exposed to 4 g/mL melittin in serum-free DMEM for 20 min, and total RNA was subjected to ribosomal RNA-depleted strand-specific RNA sequencing on an Illumina NovaSeq6000 platform (paired-end 150 bp) and small RNA sequencing on an Illumina HiSeq2500 platform (single-end 50 bp). Raw data were processed using Cutadapt to remove adapters and low-quality reads, yielding clean datasets with Q20 ≥ 99.85%, Q30 ≥ 98.48%, and valid data ratios exceeding 85%. All raw and processed sequencing data are publicly available. This transcriptomic dataset provides a valuable basis for elucidating the regulatory networks underlying melittin-induced anti-hepatoma effects. Dataset: The dataset can be accessed through the National Genomics Data Center, China National Center website by searching with the BioProject accession number PRJCA065485. Direct URL to data: Genome Sequence Archive, CNCB-NGDC. Dataset License: CC BY 4.0

10

Gene model for the ortholog of raptor in Drosophila grimshawi

Lieser, B. C.; Lose, B.; Kiser, C. A.; Butterfield, S.; Laschober, L.; Laskowski, L. F.; Nielsen, J.; Pulford, J.; Thompson, J. S.; Rele, C. P.; Wittke-Thompson, J. K.

2026-07-11 genomics 10.64898/2026.07.07.737051 medRxiv

Top 0.4%

0.8%

Gene model for the ortholog of raptor in the D. grimshawi May 2011 (Agencourt dgri_caf1/DgriCAF1) Genome Assembly (GenBank Accession: GCA_000005155.1) of Drosophila grimshawi. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

11

Gene model for the ortholog of raptor in Drosophila erecta

Backlund, A. E.; Nielsen, J.; Pulford, J.; Cook, B.; Anderson, J.; Robert, M.; Thompson, J. S.; Rele, C. P.; Wittke-Thompson, J. K.

2026-07-14 genomics 10.64898/2026.07.09.737526 medRxiv

Top 0.4%

0.8%

Gene model for the ortholog of raptor in the May 2011 (Agencourt Dere_CAF1/DereCAF1) Genome Assembly (GenBank Accession: GCA_000005135.1) of Drosophila erecta. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

12

A first pangenomic framework for globe artichoke supports SNP-based varietal fingerprinting

Portis, E.; Vergnano, E.; Gaccione, L.; Acquadro, A.; Comino, C.; Carli, C.; Barchi, L.; Martina, M.

2026-06-26 plant biology 10.64898/2026.06.25.734495 medRxiv

Top 0.4%

0.6%

Globe artichoke (Cynara cardunculus var. scolymus L.) comprises a broad range of local ecotypes and varietal groups whose genetic diversity has been investigated through different molecular markers. However, recent advances in next-generation sequencing and pangenomics approaches provide new opportunities to capture genome-wide variation at higher resolution and to develop practical tools for varietal discrimination, traceability, and germplasm conservation. In this study, we developed the first pangenomic framework for cultivated artichoke and evaluated pangenome-informed SNP markers for varietal fingerprinting. Whole-genome resequencing data from the Italian local ecotype Asti Sori were integrated with publicly available genomic data from representative globe artichoke and cultivated cardoon accessions to construct and annotate a pangenome. Genome-wide SNP and presence/absence variation (PAV) analyses were combined with pangenome-anchored genotyping-by-sequencing (GBS) data from 45 accessions representing the main cultivated varietal groups. The pangenome revealed a largely conserved core gene repertoire alongside a smaller accessory component, with gene accumulation curves suggesting a tendency toward saturation within the sampled cultivated germplasm. SNP- and PAV-based analyses provided complementary views of accession relationships and consistently resolved the principal cultivated groups. Across the broader germplasm panel, pangenome-anchored GBS-derived SNPs identified well-supported phylogenetic clusters corresponding to recognized varietal types. A reduced panel of 50 SNPs, selected through iterative random subsampling, retained at least 90% of the genetic diversity captured by the full dataset and reproduced its main population structure. This compact pangenome-anchored marker set provides a practical foundation for varietal fingerprinting, DUS-oriented applications, traceability, and conservation of traditional globe artichoke germplasm. Validation across independent collections will be required before routine deployment.

13

Evolutionary Stratification of Codon Usage Bias In Plants Arises from GC3 Composition and Translational Optimization

Mohanta, T. K.

2026-07-01 genomics 10.64898/2026.06.26.734692 medRxiv

Top 0.4%

0.6%

Codon usage bias, the non-random, preferential use of synonymous codons, is a fundamental genomic characteristic. It is a major determinant of translational efficiency, gene regulation, and molecular evolution. However, the evolutionary bias and functional relevance of codon usage bias across the plant lineage are poorly defined, and it remains unclear which factors are chiefly responsible for relative synonymous codon usage (RSCU) in genomes and how codon usage bias influences gene regulation and the molecular evolution of genomes. A genome-wide codon usage bias study of the coding DNA sequences of 262 plant genomes was conducted. It encompassed more than 4.6 billion codons from > 11 million coding sequences. Relative synonymous codon usage, codon adaptation index, codon-anticodon mapping, effective number of codons (ENC)-GC3, GC1,2-GC3, parity rule 2 (PR2-bias), molecular economy, and machine learning approaches were used for the study. Codon usage bias was found to be strongly non-random and exhibited clear phylogenetic structuring. Higher plants favoured A/T-ending codons, whereas early-diverging lineages were enriched in G/C-ending codons. Analysis of RSCU, codon adaptation index, and codon-anticodon pairing indicated that translational selection is mediated by tRNA availability, contributing sustainability to these molecular patterns. Machine-learning approaches identified a small subset of codons having an outsized influence on genome-wide codon usage landscapes. Further analyses revealed robust inverse relationships between the effective number of codons and GC content at synonymous third positions. Neutrality analysis revealed that approximately 61% of the variation was driven by mutational pressure, tempered by selective constraints. Phylogenetic reconstruction showed a progressive relaxation of codon bias from algae to angiosperms while maintaining a conserved molecular economy cost of ~30 ATP per codon across the lineages. The study revealed that codon usage bias is a lineage-specific, evolutionarily conserved trait governed by mutation, selection, and translational optimization.
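
As an illustration of the RSCU metric named in this abstract, the sketch below uses the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard nuclear genetic code and in-frame coding sequences; the function name and the toy sequence are hypothetical and are not drawn from the preprint's own pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_sequences):
    """Relative synonymous codon usage for in-frame coding sequences.

    RSCU = observed codon count * number of synonymous codons
           / total count for that amino acid.
    Amino acids that never occur, and stop codons, are skipped.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1

    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, codons in families.items():
        total = sum(counts[c] for c in codons)
        if aa == "*" or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Hypothetical toy CDS: four alanine codons, three GCT and one GCC.
toy = rscu(["GCTGCTGCTGCC"])
print(toy["GCT"], toy["GCC"], toy["GCA"])
```

An RSCU above 1 marks a codon used more often than expected under equal usage, and a value below 1 marks one used less often.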

14

Low-molecular-weight Ulva lacinulata extract exhibiting anti-inflammatory and pro-autophagic activities in RAW 264.7 macrophages: a promising candidate for the development of active ingredients targeting low-grade inflammation

Cherfan, J.; Heerah, D.; Bodet, P.-E.; Musnier, B.; Saliba, J.; Sulpice, R.; Bodin, J.; Dufour, D.; Fioramonti, X.; Dinel, A.-L.; Joffre, C.; Delmarre, P.; Le Faouder, J.; Bouvret, E.; Arnaudin, I.; Maugard, T.; Bridiau, N.

2026-07-08 biochemistry 10.64898/2026.07.07.734444 medRxiv

Top 0.5%

0.5%

Marine macroalgae are valuable sources of bioactive compounds. In this study, we thus investigated the chemical composition and biological activity of an extract from the green seaweed Ulva lacinulata, composed of small bioactive compounds. Comprehensive compositional analyses and high-resolution mass spectrometry revealed its diverse molecular profile composed in particular of peptides/amino acid derivatives, saccharides, low-chain fatty diacids, oxylipins and minerals. Its anti-inflammatory activity was assessed after 6 h pre-treatment in LPS-stimulated cultured RAW 264.7 macrophages, showing that it significantly and dose-dependently reduced the expression and/or secretion of pro-inflammatory cytokines such as TNF-α and IL-6, and targeted the NF-kB signaling cascade. It modulated the SIRT1-AMPK signaling axis and increased the LC3-II/LC3-I ratio, supporting the activation of a controlled autophagic response. This work highlighted the potential of this marine-derived extract as a safe and effective functional ingredient for the development of functional food and/or dietary supplements targeting chronic low-grade inflammation.

15

Haplotypes variations of yellow stripe like (TaYSL) genes are associated with grain iron and zinc contents in wheat (Triticum aestivum L.)

Abbasi, K.; Qayyum, H.; Naseer, S.; Sun, M.; Quraishi, M. A.; Danyal, Y.; Hao, Y.; He, Z.; Rasheed, A.

2026-07-08 plant biology 10.64898/2026.06.17.732851 medRxiv

Top 0.5%

0.5%

The availability of the pangenome and the resequencing of wheat collections have facilitated the discovery of gene-trait associations in wheat. Yellow stripe-like (YSL) proteins play a key role in the uptake and translocation of metals and yet have not been fully identified and analyzed at the genome-wide level in wheat. In this study, 26 TaYSL genes were identified and divided into four distinct clades, each clade sharing similar domains and motif compositions. Most genes were upregulated under iron deficiency, whereas homoeologs of TaYSL1 were downregulated. Both SNP-based and haplotype-based association studies were used to dissect the role of TaYSLs underpinning grain iron contents (GFeC) and zinc contents (GZnC) in wheat. TaYSL6-2B and TaYSL16-1A haplotypes showed strong association with GFeC, and TaYSL14-6A showed strong association with GZnC in multiple field trials. The distribution of favorable haplotypes in a global wheat collection of ~3000 accessions showed that the majority of haplotypes were more prevalent in landraces and winter wheat than in modern cultivars and spring types, indicating their potential for use in breeding. The combination of favorable haplotypes of the three YSL genes associated with GFeC and GZnC was very rare, and most of the wheat accessions have single or double favorable haplotypes. These findings provide the first comprehensive characterization of the TaYSL gene family in wheat and identify significant SNPs and elite haplotypes that can be utilized for genetic improvement and biofortification.

16

The contribution of recent and historical demographic histories to genomic diversity and conservation status in plant species

Tao, T.; Li, P.; Zhu, Y.; Zhang, S.; Zhang, M.; Lascoux, M.; Chen, J.

2026-06-29 evolutionary biology 10.64898/2026.06.24.734111 medRxiv

Top 0.5%

0.5%

Demographic factors are intrinsically crucial to evaluate species' extinction risk. However, measuring them remains difficult and time-consuming and the use of genomic summary statistics has been advocated to assess the conservation status of a species. In the present study, we estimated (i) the census number (Nc), (ii) effective population size (Ne) over three different time periods, recent, historical and ancient, (iii) neutral genetic diversity (π4), and (iv) a measure of the efficacy of purifying selection (π0/π4) for 101 plant species using population genomic sequencing data. Twenty-one species are from the Plant Species with Extremely Small Populations (PSESP) program of SW China. Threatened species exhibited significantly lower Ne, Nc, π4, and weaker purifying selection, but had a higher Ne/Nc ratio than non-threatened ones. Nc was the main determinant in identifying conservation status, and contemporary neutral genetic diversity was predominantly influenced by historical Ne. In the absence of demographic information, genetic parameters are a good proxy of conservation status, likely because currently threatened species also had a low historical population size. In summary, our findings suggest that direct estimates of Nc are more useful than π4, although the latter remains a valuable conservation indicator. Hence, efforts such as the PSESP should be extended.

17

Near-Gapless and Haplotype-Resolved Capsella Genomes Enable Investigation into Genomic Consequences of Mating System Shifts

Chen, H.; Emmerson, R.; Mosher, R. A.

2026-07-10 plant biology 10.64898/2026.07.10.737683 medRxiv

Top 0.6%

0.5%

The shift from outcrossing to self-fertilization is a common evolutionary transition in flowering plants. The genus Capsella, comprising the obligate outcrosser C. grandiflora and two self-fertile species, C. rubella and C. orientalis, provides a powerful system to explore genomic consequences of mating system shifts. Despite its utility, existing genomic resources in Capsella are fragmented, incomplete, and particularly deficient in repetitive genomic regions, hindering the study of transposable element (TE) dynamics and gene annotation. Here, we present high-quality, chromosome-scale, near-gapless genome assemblies for C. grandiflora, C. rubella, and C. orientalis. Leveraging these improved genomes, we created high-quality genomic resources for the Capsella genus by performing comprehensive, de novo annotations of protein-coding genes and TEs. Comparative genomic analysis among these species reveals differences in TE abundance, position, and production of small RNAs. These resources provide an unprecedented opportunity to explore how mating system transitions influence genome architecture, TE behavior, and gene evolution. This research also developed a static online platform for Capsella genomic resources, Capsella Database (CapBase, www.capsella.uk), to support community use of these resources. Our findings advance understanding of the genomic impacts of selfing and establish a robust foundation for future research into genomics, epigenomics, and evolutionary biology within Capsella and related plant systems.

18

Gene model for the ortholog of tgo in Drosophila busckii

Perez, J.; Giunta, A. A.; Wittke-Thompson, J. K.

2026-07-01 genomics 10.64898/2026.06.26.734908 medRxiv

Top 0.6%

0.4%

Gene model for the ortholog of tango (tgo) in the Sep. 2015 (UC Berkeley ASM127793v1/DbusGB1) Genome Assembly (GenBank Accession: GCA_001277935.1) of Drosophila busckii. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

19

A gapless telomere-to-telomere reference genome of Ostreococcus tauri RCC4221 with expanded annotation of medium-sized ncRNAs

Liu, G.; Bousquet, L.; Mayeur, H.; Manirakiza, E.; Daric, V.; Klopp, C.; Noirot, C.; Lopez-Escardo, D.; Grimsley, N. H.; Yau, S.; Krasovec, M.; Echeverria, M.; Piganeau, G.

2026-07-14 genomics 10.64898/2026.07.10.737489 medRxiv

Top 0.6%

0.4%

Marine photosynthetic microbes contribute substantially to global primary production, yet many algal lineages still lack reference genomes with the continuity and annotation quality required for fine-scale structural, regulatory and comparative analyses. Ostreococcus tauri, one of the smallest known free-living photosynthetic eukaryotes, has been a model marine picoeukaryote for over two decades. Despite successive improvements to its historical reference genome, previous assemblies retained hundreds of gaps and incomplete genes, hampering high-resolution genomic analyses. Here, we present O. tauri RCC4221 genome version 2026, a telomere-to-telomere assembly of all 20 chromosomes spanning 13.34 Mb with no gaps. This assembly combines PacBio long-read sequencing, Illumina short-read polishing, and correction of unresolved regions guided by independent Nanopore-based assemblies. The updated reference supports a curated annotation comprising 7,683 protein-coding genes, 48 tRNA genes, 3 rRNA operons, 116 medium-sized noncoding RNAs, one signal recognition particle RNA and 138 small nucleolar RNAs. It also improves gene-model integrity and recovers candidate coding loci absent from the 2014 reference. Structural analyses resolved the organization of the two atypical low-GC chromosomes, 2 and 19, which contain duplicated regions that were collapsed or misrepresented in previous assemblies. Finally, bisulfite sequencing and PacBio SMRT sequencing revealed a dual DNA methylation landscape, with CG-context cytosine methylation concentrated in gene bodies and N6-methyladenosine (m6A) enriched at the start codon. The updated O. tauri 2026 assembly provides a complete and curated reference resource for chromosome biology, comparative genomics, epigenomics and RNA biology in a model marine picoeukaryote.

20

Development of a High-throughput in vivo Assay for the Determination of Adenylation Domain Specificities

Praeve, L.; Liu, J.; Zhou, Y.; Lonono Sanchez, O. N.; Wacker, A. B.; Bode, H. B.

2026-07-15 biochemistry 10.64898/2026.07.14.738513 medRxiv

Top 0.7%

0.4%

Natural product synthesis by non-ribosomal peptide synthetases (NRPS) is largely defined by the substrate selectivity of the adenylation (A) domains. Previous assays for specificity determination were mainly performed in vitro and required protein purification. In this work, we developed, based on NRPS engineering, a novel in vivo assay suitable for high-throughput application named ASCR (A domain screening). Using the recently described XUT fusion sites, A domains and their upstream condensation domains were assembled as di-domains into a characterized NRPS model system, which allowed detection of defined tripeptide products via mass spectrometry directly after cell culture extraction. We evaluated the assay by screening a total of 54 A domains from five known and seven uncharacterized NRPS, covering a broad range of organism taxonomy and GC content of the investigated NRPS-encoding genes. Additionally, we applied the assay to elucidate and confirm the structures of novel cyclic pentapeptides derived from three novel NRPS from Photorhabdus temperata K122.